Efficient Inference in Markov Control Problems
Abstract
Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward-weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite-horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite-horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension yields a novel algorithm for both policy gradients and Expectation Maximisation in infinite-horizon problems.

1 MARKOV DECISION PROBLEMS

A Markov Decision Problem (MDP) is described by an initial state distribution p1(s1), transition distributions p(st+1|st, at) and a reward function Rt(st, at), where the state and action at time t are denoted by st and at respectively (Sutton and Barto, 1998). The state and action spaces can be either discrete or continuous. For a discount factor γ ∈ [0, 1), the reward is defined as Rt(st, at) = γ^(t−1) R(st, at) for a stationary reward R(st, at). We assume a stationary policy, π, defined as a set of conditional distributions over the action space, πa,s = p(at = a|st = s, π). To avoid cumbersome notation we also use zt = {st, at} to denote a state-action pair, and the bold typeface, zt, to denote a vector. The total expected reward of the MDP (the policy utility) is given by

U(π) = Σt E_{p(st, at|π)} [ Rt(st, at) ].
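As a concrete illustration of the quantities above, the following minimal sketch evaluates the policy utility U(π) of a small tabular finite-horizon MDP by a forward recursion over the state marginals p(st|π). The state/action/horizon sizes and the random transition and reward tables are hypothetical toy values, not taken from the paper, and this is the standard forward computation rather than the paper's new inference algorithm.

```python
import numpy as np

# Toy tabular finite-horizon MDP (hypothetical sizes, random dynamics).
S, A, H = 3, 2, 10                            # states, actions, horizon
gamma = 0.9                                   # discount factor in [0, 1)

rng = np.random.default_rng(0)
p1 = np.full(S, 1.0 / S)                      # initial state distribution p1(s1)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s'] = p(s'|s, a)
R = rng.random((S, A))                        # stationary reward R(s, a) in [0, 1)
pi = np.full((S, A), 1.0 / A)                 # stationary uniform policy pi(a|s)

# Forward recursion: propagate the marginal p(s_t|pi) and accumulate
# U(pi) = sum_t gamma^(t-1) E[R(s_t, a_t)]  (t is 0-indexed below).
mu = p1.copy()
U = 0.0
for t in range(H):
    joint = mu[:, None] * pi                  # p(s_t, a_t | pi)
    U += gamma**t * np.sum(joint * R)         # expected discounted reward at t
    mu = np.einsum('sa,sap->p', joint, P)     # marginal p(s_{t+1} | pi)
print(U)
```

Because the rewards lie in [0, 1), the utility is bounded above by the geometric sum (1 − γ^H)/(1 − γ), which gives a quick sanity check on the recursion.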
Similar papers
Efficient Markov Logic Inference for Natural Language Semantics
Using Markov logic to integrate logical and distributional information in natural-language semantics results in complex inference problems involving long, complicated formulae. Current inference methods for Markov logic are ineffective on such problems. To address this, we propose a new inference algorithm based on SampleSearch that computes probabilities of complete formulae rather tha...
Efficient Sampling for Gaussian Process Inference using Control Variables
Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, t...
The Segmented iHMM: A Simple, Efficient Hierarchical Infinite HMM
We propose the segmented iHMM (siHMM), a hierarchical infinite hidden Markov model (iHMM) that supports a simple, efficient inference scheme. The siHMM is well suited to segmentation problems, where the goal is to identify points at which a time series transitions from one relatively stable regime to a new regime. Conventional iHMMs often struggle with such problems, since they have no mechanis...
Stacked Graphical Learning: Learning in Markov Random Fields using Very Short Inhomogeneous Markov Chains
We describe stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance's features with predictions on other related instances. Stacked graphical learning is efficient, especially during inference, captures dependencies easily, and can be constructed from any kind of base learner. In experiments on two classification pro...
Global optimization using the asymptotically independent Markov sampling method
In this paper, we introduce a new efficient stochastic simulation method, AIMS-OPT, for approximating the set of globally optimal solutions when solving optimization problems such as optimal performance-based design problems. This method is based on Asymptotically Independent Markov Sampling (AIMS), a recently developed advanced simulation scheme originally proposed for Bayesian inference. This...
Published: 2011